ABSTRACT

This paper presents a new approach to the automated identification of human emotions based on the analysis
of body movements and the recognition of gestures and poses. The methodology, models and automated system
for emotion identification are described. To characterize a person's emotions, body movements are described
in the model with linguistic variables and a fuzzy hypergraph for temporal events, which are transformed
into expressions in a limited natural language.


1. INTRODUCTION

Many modern information technologies, such as the Internet, robotics, games and video monitoring, have
become part of human life. The main purpose of these information technologies is to improve
human-computer interaction. However, replacing real persons with automated systems, for instance, is
impossible without overcoming the barrier in the man-machine relationship (Orlova and Rozaliev, 2011).
The inability of machines to recognize and show emotions is one of the impeding factors. The development
of telecommunication technologies is changing interpersonal communication. Very soon people will use
virtual communication, which will be more effective and easier to learn but cannot express emotions.
At the same time, emotions play a vital role in human life. Emotions influence cognitive processes
(Bernhardt, 2010) and decision making (Petrovsky, 2009). It is therefore important to recognize and
identify human emotions and to use them.

We have developed a new approach to the identification of human emotions that is based on the description
and analysis of body movements and on the recognition of gestures and postures specific to emotional
states. In this paper, we present the methodology, models and automated system that implement the
proposed approach.


2. IDENTIFICATION OF HUMAN BODY MOVEMENTS 

The identification of human emotional responses is based on how a person manifests his or her emotions
(Ilyin, 2008; Rozaliev and Zaboleeva-Zotova, 2010).

Various companies are now actively developing automated systems for the recognition, identification and
transmission of emotional reactions. Many of these systems use web solutions based on the SaaS (Software
as a Service) model. Emotional states can be determined in different ways, for example by voice, facial
expression, body movements or physiological parameters (Bernhardt, 2010; Coulson, 2004; Hadjikhani and
Gelder, 2003; Laban and Ullmann, 1988).

The proposed approach to emotion identification is based on the recognition and analysis of human
gestures and poses (Zaboleeva-Zotova et al., 2011b).

First, we recognize a person in video images using markerless motion capture with the Microsoft Kinect
digital video camera. The video is converted into a special animation format, the BVH file, which
describes the poses of the body skeleton and contains motion data. This technology makes it possible to
visualize and analyze a person's movements and to determine the regions of static or dynamic postures
and of micro and macro movements.

To detect the boundaries of movements, the person's motor activity is analyzed. To separate postures, we
introduce a special notion of activity, which depends on which part of the body performs the movement.
We describe typical body movements with linguistic variables and fuzzy hypergraphs for temporal events,
and transform these descriptions into expressions in a limited natural language that characterize the
person's emotions. Human emotional reactions such as joy, sadness and anger are identified through
detailed analysis of postures, gestures and motions.

Figure 1. The Architecture of the System for Identification of Human Emotional Reactions

The architecture of the computer system developed for the identification of human emotional reactions is
shown in Fig. 1 (Zaboleeva-Zotova et al., 2011a). The inputs of the system are moving images, sound
samples and handwritten texts. The output of the system is information about the emotional state of the
real person, expressed in a limited natural language.


3. VECTOR MODEL OF SKELETON 

To determine human emotional reactions from body movements, we use a vector model of the skeleton,
which is obtained from video captured with the Microsoft Kinect digital video camera.

The Kinect camera provides a three-dimensional image in all lighting conditions and imposes no
requirements on the actor in the frame. Data from Kinect are represented as a hierarchy of nodes of the
human skeleton. The rotation of one joint relative to another is represented as a quaternion (the bones
of the skeleton play the role of the rotating vectors), and the offset is represented as a
three-dimensional vector in the coordinate system local to each node. To obtain a BVH file, we have
developed a new method that consists of the following steps:

1. Get data from the Kinect camera.
2. Determine the displacement of each node relative to its parent node.
3. Record the hierarchy of key nodes in accordance with the specifics of the BVH format.
4. Convert the quaternions to Euler angles.

Most existing animation packages can work with the vector model of body movements represented as a
BVH file.
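
The quaternion-to-Euler conversion in the last step can be sketched as follows. The text does not state which rotation order is written to the BVH file, so this minimal Python sketch assumes the common Z-Y-X (yaw, pitch, roll) convention; the function name `quaternion_to_euler_zyx` is hypothetical and is not taken from the described system.

```python
import math


def quaternion_to_euler_zyx(w, x, y, z):
    """Convert a quaternion to Z-Y-X Euler angles in degrees.

    Returns (rx, ry, rz): the rotations about the X, Y and Z axes.
    The Z-Y-X order is an assumption, not taken from the paper.
    """
    # Normalize defensively so that sensor noise does not break asin().
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    # Rotation about X (roll).
    rx = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Rotation about Y (pitch); clamp to stay inside the domain of asin().
    sin_ry = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    ry = math.asin(sin_ry)
    # Rotation about Z (yaw).
    rz = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(angle) for angle in (rx, ry, rz))
```

For a quaternion describing a quarter turn about the Z axis, the function returns a rotation of about 90 degrees about Z and about zero for the other two axes. Near a pitch of 90 degrees the Euler representation is ambiguous (gimbal lock), which any such conversion has to tolerate.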

The vector model of the human body is a formalized representation of a person's movement in which the
bones of the human skeleton are represented as vectors, and the angles between them correspond to the
rotation angles of the main nodes of the human body relative to each other. The vector model of the
skeleton consists of 22 nodes, which correspond to different anatomical joints with one, two or three
axes of rotation (Fig. 2).



Anatomical part of body                             Node of vector model
Cervical spine                                      Head, Neck
Shoulder joint                                      Collar, Shldr
Wrist, radiocarpal joint                            Hand
Ulna, radioulnar joint                              Fore Arm
Thoracic spine                                      Chest, Abdomen
The lumbar spine                                    Hip, Waist
Hip                                                 Thigh
Knee-joint                                          Shin
Ankle joint                                         Foot
Metatarsophalangeal and interphalangeal joints      Toe

Figure 2. The Vector Model of Body Skeleton and Correspondence between Anatomical Parts of Body and Nodes of the Vector Model


Using the information on the structure of the body skeleton given by the vector model and the motion
data contained in BVH files, which describe the poses of the skeleton, we formalize the concept of the
motor activity of a person expressed in gestures as follows:

$$A(\Delta t) = \sum_{n=1}^{m} T_n(\Delta t) \cdot k_n .$$

Here $m$ is the number of time series describing the movement of the body parts, $T_n(\Delta t)$ is the
variation of the $n$-th time series over the time $\Delta t$, and $k_n$ is a coefficient that
characterizes the influence of the body parts on the body motion for the $n$-th time series.

The influence coefficient can be calculated as the following sum:

$$k_n = \sum_{i=1}^{j} p_i \cdot q_{ni} ,$$

where $i$ is the index of the body part, $j$ is the number of moving body parts, $q_{ni}$ is the share of
the body part in the total body mass, and $p_i$ is a gender coefficient of proportionality. According to
biomechanical studies, the averaged values of the ratio $q_{ni}$ for adults are 6.9% for the head, 15.9%
for the upper section of the trunk, 2.1% for the shoulder, 16.3% for the middle section of the trunk,
1.6% for the forearm, 11.2% for the lower section of the trunk, 0.6% for the hand, 14.2% for the thigh,
4.3% for the lower leg and 1.4% for the foot. The gender coefficient $p_i$ is approximately equal to 1
for all parts of a man's body and differs for various parts of a woman's body.
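
The two sums above can be illustrated with a short Python sketch. The mass shares are the values listed in the text; the function names, the dictionary keys and the assignment of body parts to a time series are assumptions made only for this illustration.

```python
# Mass shares q of body parts (fractions of total body mass), as listed in the text.
MASS_SHARE = {
    "head": 0.069, "upper_trunk": 0.159, "shoulder": 0.021,
    "middle_trunk": 0.163, "forearm": 0.016, "lower_trunk": 0.112,
    "hand": 0.006, "thigh": 0.142, "lower_leg": 0.043, "foot": 0.014,
}


def influence_coefficient(moving_parts, gender_coeff=None):
    """k_n = sum over the moving body parts i of p_i * q_ni."""
    gender_coeff = gender_coeff or {}
    # p_i defaults to 1.0, the approximate value the text gives for men.
    return sum(gender_coeff.get(part, 1.0) * MASS_SHARE[part]
               for part in moving_parts)


def motor_activity(variations, coefficients):
    """A(dt) = sum over the time series n of T_n(dt) * k_n."""
    if len(variations) != len(coefficients):
        raise ValueError("one coefficient is needed per time series")
    return sum(t * k for t, k in zip(variations, coefficients))


# Hypothetical example: one time series driven by an arm, one by the head.
k_arm = influence_coefficient(["shoulder", "forearm", "hand"])
k_head = influence_coefficient(["head"])
activity = motor_activity([12.0, 3.0], [k_arm, k_head])
```

With these weights, a movement of the trunk or thigh contributes far more to the activity value than an equally large movement of the hand.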

Another important characteristic of body movement is the mobility of a joint, which is measured in
morphology by the angles of flexion-extension, abduction-adduction and internal-external rotation as
follows:

$$M_{joint} = \mathrm{angle}(\text{Flexion} + \text{Extension},\ \text{Adduction} + \text{Abduction},\ \text{Internal} + \text{External}).$$

The maximum spine mobility is a sum of the angles of the left and right rotation around the longitudinal 
axis of the body. 

To automatically segment the video into regions of individual poses and gestures, we introduce
additional user-defined parameters: the minimum duration of a movement, the activity level for poses
and the activity level for movements. Next, we construct a graph of activity and find the regions of
postures and movements.
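
One possible way to find such regions from a per-frame activity series is sketched below in Python. The text names the three user-defined parameters but not the exact procedure, so the hysteresis between the two activity levels and the merging of short movements into poses are assumptions of this sketch, and the function name `segment_activity` is hypothetical.

```python
def segment_activity(activity, pose_level, movement_level, min_duration):
    """Split a per-frame activity series into pose and movement regions.

    Returns a list of (label, first_frame, last_frame) tuples.
    """
    if pose_level > movement_level:
        raise ValueError("pose_level must not exceed movement_level")

    # 1. Label every frame; values between the two levels keep the previous
    #    label, which avoids flicker around a single threshold.
    labels = []
    current = "pose"
    for value in activity:
        if value >= movement_level:
            current = "movement"
        elif value <= pose_level:
            current = "pose"
        labels.append(current)

    # 2. Collapse consecutive frames with the same label into runs.
    runs = []
    for frame, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1][2] = frame
        else:
            runs.append([label, frame, frame])

    # 3. Movements shorter than min_duration are treated as part of a pose.
    for run in runs:
        if run[0] == "movement" and run[2] - run[1] + 1 < min_duration:
            run[0] = "pose"

    # 4. Merge neighbouring runs that now share a label.
    merged = []
    for label, start, end in runs:
        if merged and merged[-1][0] == label:
            merged[-1][2] = end
        else:
            merged.append([label, start, end])
    return [tuple(run) for run in merged]
```

For the series `[0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.7, 0.5, 0.1]` with levels 0.2 and 0.6 and a minimum duration of 3 frames, the single-frame spike at frame 2 is absorbed into the first pose and frames 5 to 8 form one movement.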

Poses discussed in detail in the works of B. Birkenbil, G. Wilson, D. Morrison and A. Pease were merged
into granules on the basis of similar interpretation. Since a single posture does not unambiguously
determine a person's current emotional state, we determine the granules to which the posture belongs.
This allowed us to increase the reliability of identifying a particular emotional state. The
correspondence between granules of poses and the basic emotional states according to K. Izard is shown
in Fig. 3.



Figure 3. Correspondence between Granules of Poses and Basic Emotional States


4. FORMALIZATION OF HUMAN MOVEMENTS 


In the vector model of the skeleton, the movements of the human body are described with linguistic
variables that characterize the duration of an event and the variation of a rotation angle. The duration
of an event is measured in frames of the video image. The fuzzy temporal variable “Duration of event”
includes the following set of terms: $D_0$ ‘Zero’, $D_1$ ‘Very short’, $D_2$ ‘Short’, $D_3$ ‘Moderate’,
$D_4$ ‘Long’, $D_5$ ‘Very long’. The membership functions of the variable “Duration of event” are
presented in Fig. 4.

Figure 4. The Membership Functions of the Variable “Duration of Event”
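
In code, a linguistic variable of this kind can be represented as a set of named membership functions. The Python sketch below uses triangular functions with purely hypothetical breakpoints (in frames); the actual shapes and breakpoints are those plotted in the figure and are not reproduced here. The names `triangular`, `DURATION_TERMS` and `fuzzify` are illustrative.

```python
def triangular(a, b, c):
    """Return a triangular membership function with support [a, c] and peak at b."""
    def mu(x):
        if x == b:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


# Hypothetical breakpoints in frames, chosen only to make the sketch runnable.
DURATION_TERMS = {
    "Zero": triangular(0, 0, 6),
    "Very short": triangular(0, 6, 12),
    "Short": triangular(6, 12, 24),
    "Moderate": triangular(12, 24, 48),
    "Long": triangular(24, 48, 96),
    "Very long": triangular(48, 96, 192),
}


def fuzzify(value, terms):
    """Return the membership degree of value in every term of the variable."""
    return {name: mu(value) for name, mu in terms.items()}
```

Calling `fuzzify(3, DURATION_TERMS)` then yields one degree per term, and a duration can belong to two neighbouring terms at once, which is what makes the description fuzzy.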

Each group of joints with similar values of maximal mobility is described with the linguistic variable
“Variation of rotation angle”, which consists of the following set of terms: $B_0$ ‘Stabilization’,
$B_{+1}$ ‘Very slow increasing’, $B_{+2}$ ‘Slow increasing’, $B_{+3}$ ‘Moderate increasing’, $B_{+4}$
‘Fast increasing’, $B_{+5}$ ‘Very fast increasing’, $B_{-1}$ ‘Very slow decreasing’, $B_{-2}$ ‘Slow
decreasing’, $B_{-3}$ ‘Moderate decreasing’, $B_{-4}$ ‘Fast decreasing’, $B_{-5}$ ‘Very fast
decreasing’. The membership functions of the variable “Variation of rotation angle” are presented in
Fig. 5. This linguistic variable can be adjusted to various types of human movements and makes it
possible to describe, for instance, small periodic fluctuations such as tapping on the table, shaking
hands or fingers, shifting from foot to foot, and so on.

Figure 5. The Membership Functions of the Variable “Variation of Rotation Angle”

By specifying the name of the analyzed part of the body and the range of movements in the vector model
of the skeleton, one can obtain the values of the rotation angles of the node relative to one of the
axes X, Y or Z, which are stored in a separate data array. From this array a subarray is selected that
contains the values of the angles $p_i$ falling in the analyzed range. The angles belonging to different
frames for the same node form a triangular matrix whose elements are determined by the following rule:
$p_{ij} = p_j - p_i$ for $j > i$, $p_{ij} = 0$ for $j < i$. This triangular matrix is used to calculate
the values of the membership function of the linguistic variable “Variation of rotation angle”.
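
The rule for the triangular matrix can be written as a short Python sketch. The function name `angle_difference_matrix` is hypothetical; the diagonal is left at zero, since the difference of an angle with itself is zero.

```python
def angle_difference_matrix(angles):
    """Build the triangular matrix p[i][j] = angles[j] - angles[i] for j > i.

    Elements with j <= i are zero.
    """
    n = len(angles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = angles[j] - angles[i]
    return matrix


# Angles in degrees for three consecutive frames; the middle value is hypothetical.
differences = angle_difference_matrix([10.00, 8.00, 6.13])
```

Each element above the diagonal is the change of the angle between frame `i` and a later frame `j`, so one row gives all variations that start at a given frame.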

The movement of a joint around an axis is described in the form of fuzzy temporal events. Since the
events follow one another on the time axis, the motion can be represented as a fuzzy sequential temporal
sentence (Bernshtein et al., 2009). For example, the variation of the angle of rotation around the X
axis for the joint “right foot” in the interval $[t_4; t_{12}]$ shown in Fig. 6 can be described by the
following series of fuzzy temporal statements: “For the right foot there is a very slow decrease of the
angle of very short duration. This is followed by stabilization of the angle of zero duration. This is
followed by a very slow increase of the angle of very short duration”.



Figure 6. Variation of the Angle of Rotation around the X-Axis for the Joint “Right Foot” (axis label: frame number)

The above fuzzy sequential temporal sentence can be written formally as follows:

$$W = (B_{-1}\ \mathit{rtf}\ D_1)\ \mathit{rtsn}\ (B_0\ \mathit{rtf}\ D_0)\ \mathit{rtsn}\ (B_{+1}\ \mathit{rtf}\ D_1),$$

where rtf is a fuzzy temporal relationship; rtsn is the temporal relationship of direct sequence; $B_0$
is the term ‘Stabilization’, $B_{-1}$ is the term ‘Very slow decreasing’, $B_{+1}$ is the term ‘Very
slow increasing’ of the linguistic variable “Variation of rotation angle”; $D_0$ is the term ‘Zero’,
$D_1$ is the term ‘Very short’ of the fuzzy temporal variable “Duration of event”.
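
The step from the formal sentence to an expression in a limited natural language can be illustrated with the Python sketch below. A sentence is stored as a list of (B index, D index) pairs in temporal order; this data layout, the names `VARIATION_LABELS`, `DURATION_LABELS` and `render_sentence`, and the exact wording of the generated text are assumptions of the sketch and only approximate the statements quoted in the text.

```python
SPEEDS = ["very slow", "slow", "moderate", "fast", "very fast"]

# Labels of the terms B_-5 .. B_+5 of "Variation of rotation angle".
VARIATION_LABELS = {0: "stabilization"}
for k, speed in enumerate(SPEEDS, start=1):
    VARIATION_LABELS[k] = f"{speed} increasing"
    VARIATION_LABELS[-k] = f"{speed} decreasing"

# Labels of the terms D_0 .. D_5 of "Duration of event".
DURATION_LABELS = ["zero", "very short", "short", "moderate", "long", "very long"]


def render_sentence(joint, events):
    """Render a list of (b_index, d_index) events, given in temporal order."""
    sentences = []
    for position, (b_index, d_index) in enumerate(events):
        phrase = (f"{VARIATION_LABELS[b_index]} of the angle "
                  f"of {DURATION_LABELS[d_index]} duration")
        lead = f"For the {joint} there is" if position == 0 else "This is followed by"
        sentences.append(f"{lead} {phrase}.")
    return " ".join(sentences)


# W = (B_-1 rtf D_1) rtsn (B_0 rtf D_0) rtsn (B_+1 rtf D_1)
W = [(-1, 1), (0, 0), (1, 1)]
text = render_sentence("right foot", W)
```

Keeping the sentence as plain index pairs makes it easy to compare an identified movement with a stored one term by term before any text is generated.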


5. EVALUATION OF SIMILARITY BETWEEN THE IDENTIFIED AND 
ETALON MOVEMENTS 

In the model of the fuzzy sequential temporal sentence, the adequacy of the analyzed fragment $\delta_q$
of a dynamic process to the corresponding attribute $q$ is determined by the validity criterion $J$,
which is represented as follows:

$$J(q/\delta_q) = F_q(\delta_q)\ \&\ \mu_{L_q}(\delta_q).$$

Here $F_q(\delta_q)$ is the characteristic function that establishes a semantic relationship between the
fuzzy values of the secondary attributes of a dynamic process and the values of the primary attributes
determining them; $\mu_{L_q}(\delta_q)$ is the membership function of the term $L_q$ of the linguistic
variable $L$.

The validity criterion of the fuzzy sequential temporal sentence $W$ with respect to any dynamic process
$S$ is written as

$$J(W/S) = \max_{I \in V} J(W/S)_I ,$$

where $V$ is the set of all possible interpretations $I$. For instance, the validity criterion of the
fuzzy sequential temporal sentence $W$ with respect to any dynamic process $S$ for a set of fuzzy
temporal events expressed through successive attributes $a$, $b$, $c$ is described by the formula

$$J(W/S)_I = J(a/\delta_a)\ \&\ J(b/\delta_b)\ \&\ J(c/\delta_c) = (F_a(\delta_a)\ \&\ \mu_{L_a}(\delta_a))\ \&\ (F_b(\delta_b)\ \&\ \mu_{L_b}(\delta_b))\ \&\ (F_c(\delta_c)\ \&\ \mu_{L_c}(\delta_c)).$$

In our case, the analyzed dynamic process is a sequence of frames in the skeleton vector model that
characterizes the rotation of one of the skeleton nodes around the X, Y or Z axis by a certain angle,
and the validity criterion serves as the criterion of similarity between the identified and etalon
movements. Thus, an identified movement is considered well recognized with respect to the etalon
movement if the value of the similarity criterion exceeds a predefined threshold.
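
The structure of this criterion (a conjunction over successive events, a maximum over interpretations, then a threshold test) can be sketched in Python as follows. The text does not say how the conjunction & is evaluated; the sketch assumes the minimum, a common choice in fuzzy logic, so its numbers need not match the authors' results. All function names and input values are hypothetical.

```python
def event_validity(characteristic, membership):
    """J(q/delta_q) = F_q(delta_q) & mu_Lq(delta_q), with & assumed to be min."""
    return min(characteristic, membership)


def interpretation_validity(events):
    """Conjunction over the successive events of one interpretation I."""
    return min(event_validity(f, mu) for f, mu in events)


def sentence_validity(interpretations):
    """J(W/S): the maximum over all interpretations I in V."""
    return max(interpretation_validity(events) for events in interpretations)


def is_recognized(interpretations, threshold):
    """A movement counts as recognized if the similarity exceeds the threshold."""
    return sentence_validity(interpretations) > threshold


# Two hypothetical interpretations, each a list of (F, mu) pairs per event.
candidates = [
    [(1.0, 0.90), (1.0, 0.60)],
    [(1.0, 0.85), (1.0, 0.95)],
]
similarity = sentence_validity(candidates)
```

With these hypothetical inputs the first interpretation is limited by its weakest event, and the second one determines the overall similarity.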

For example, let us calculate the criterion of similarity between the identified and etalon movements
describing a rotation of the node “right ankle”. The etalon movement is given by a fuzzy temporal event
written as follows: “For the right ankle there is a very slow decrease of the angle of zero duration”.

Let the initial data be the following time series: at the time moment $t_0$ the rotation angle is
$p_0 = 10.00$ degrees; at the time moment $t_2$ the rotation angle is $p_2 = 6.13$ degrees. So, the
duration of the event is equal to 2 frames. Then, from the graph of the membership function
$\mu_{D_0}(\delta_t)$ of the term $D_0$ ‘Zero’ of the fuzzy temporal variable “Duration of event”
presented in Fig. 4, we find the value $\mu_{D_0}(\delta_t) = 0.70$ for $\delta_t = t_2 - t_0 = 2$
frames. From the graph of the membership function $\mu_{B_{-1}}(\delta_p)$ of the term $B_{-1}$ ‘Very
slow decreasing’ of the linguistic variable “Variation of rotation angle” presented in Fig. 5, we find
the value of $\mu_{B_{-1}}(\delta_p)$ for $\delta_p = p_2 - p_0 = 6.13 - 10.00 = -3.87$ degrees. Thus,
the criterion of similarity is $J(W/S) = 0.92$. If the threshold is equal to 0.80, then the identified
rotation of the node “right ankle” is similar to the etalon movement.

6. USE FOR TEACHING CHILDREN WITH HEARING DISABILITIES 

We use information about emotional reactions to control the education of children with hearing
disabilities. Here we briefly describe another system that we have developed. The system is aimed at
recognizing and translating, in real time, gestures of the Russian sign language of the deaf into text
and text into gestures. The system is intended for training children with hearing disabilities and
adults who need to learn sign language. It will be used in a test mode in a school for children with
limited hearing, and it is already receiving positive reviews.

The problem of using Kinect to recognize small hand gestures is still unresolved, despite the successful
application of Kinect to face recognition and human body tracking. The main reason for this is the low
resolution of the depth map sensor.

In sign languages, information is transmitted via several channels: directly through hand gestures,
facial expressions, lip shape, and the position of the body and head. Hand gestures are described by
hand position, direction of movement, and the shape and direction of the hands. The first stage of
recognition is segmentation of the image received from Kinect to find the hand or both hands. Developing
a method for finding the hand in the picture is one of the most complicated problems in creating a
gesture recognition system. Several features can be used to detect an object in an image: appearance,
shape, color, distance to the subject and context. When detecting faces in an image, appearance is a
good feature, as the eyes, nose and mouth are always in about the same proportions. Therefore, to find
the hands, we first find the person's face, determine its color and select the closest object, which we
take to be the hands. Next, we apply the developed method for finding the hands and determine the
user's gesture. An example of the recognition of the user's hand is shown in Fig. 7.

The system works as follows. The user enters text. The system displays an animated image of the gesture. 
A sample output of the animated gesture is shown in Fig. 8. 



Figure 8. Animated Demonstration of Gestures

The child user repeats the movement. An example of the user showing a gesture is shown in Fig. 9.

Figure 9. Showing a Gesture of the Language of the Deaf

The movement is recognized and checked for correctness. If it is not correct, the movement is shown
again. If it is correct, new text is entered. If the user begins to adopt a closed posture
characteristic of anger or resentment, the administrator is informed and the learning process can be
stopped.
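
The decision logic of one training step can be summarized in a small Python sketch. The function name, the action labels and the posture label "closed" are hypothetical and stand in for whatever the recognition and emotion identification modules actually return.

```python
def training_step(gesture_correct, posture_granule):
    """Decide what the tutor does after the child repeats a gesture.

    posture_granule is a hypothetical label produced by the emotion
    identification module for the child's current posture.
    """
    # A closed posture signals anger or resentment: inform the administrator.
    if posture_granule == "closed":
        return "notify_administrator"
    # A wrong gesture is demonstrated again; a correct one allows new text.
    return "enter_new_text" if gesture_correct else "show_gesture_again"
```

Checking the posture before the correctness of the gesture reflects the idea that the emotional state of the learner takes priority over progress through the exercise.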


7. CONCLUSION 

The identification of human emotional states is close to the problem of understanding what normal
behavior is. The variety of “normal” behavior is great, so it is difficult to draw the line between
acceptable and unacceptable behavior. Automating human emotion recognition can help to solve many
problems of relationships between people and to avoid possible misunderstandings.

Automated systems for the identification of human emotions by gestures and movements can be useful and
necessary in various areas, such as communication of deaf people; education and learning; emergency
services; monitoring the unstable emotional state of pilots, drivers and operators of complex technical
systems; and monitoring public places to prevent illegal and extremist actions.

In the future, we intend to use the developed approach to determine the emotional response from
handwritten text, and to animate human gestures and motions described in a limited natural language.